(2) RELATED APPEALS AND INTERFERENCES
Appellants, their legal representative and the assignee are not aware of any related 
appeals or interferences which will directly affect or be directly affected by or have a bearing on 
the Board's decision in the instant appeal. 



(3) STATUS OF THE CLAIMS
Claims rejected: Claims 2, 3, 13-15 and 21.
Claims allowed: Claim 1.
Claims canceled: Claims 16-20.
Claims withdrawn: Claims 4-12.
Claims on Appeal: Claims 2, 3, 13-15 and 21 (A copy of the claims on appeal, as
amended, can be found in the attached Appendix).



(4) STATUS OF AMENDMENTS AFTER FINAL 
Claim 21 was amended in the Response to Final Office Action filed on July 1, 2003 to 
address a rejection raised by the Examiner, and in the Response to Final Office Action filed 
October 21, 2003 to address an objection raised by the Examiner. For purposes of appeal, the 
proposed amendment will be entered. See Advisory Action, mailed March 2, 2004. 

(5) SUMMARY OF THE INVENTION 
Appellants' invention is directed to a composition comprising a plurality of
polynucleotides, SEQ ID NOs: 1-13, and their encoded polypeptides, SEQ ID NOs: 14 and 15,
that are highly coexpressed with known genes that regulate, respond to, or participate in insulin
synthesis. The polynucleotides and their encoded proteins are therefore asserted to be useful in
the diagnosis, prognosis, and treatment of pancreatic disorders. See specification, at page 4, lines
6-8. SEQ ID NO:2 is further disclosed as expressed only in pancreatic tissues by Northern
analysis and, furthermore, as differentially expressed in type I diabetes, greater than five-fold
relative to any other normal or diseased pancreatic tissue. See specification, at page 28, line 28
through page 29, line 9. These data therefore confirm the strength of
coexpression analysis, that is, the use of known genes to identify unknown polynucleotides and their
encoded proteins which are highly significantly associated with insulin synthesis and pancreatic
disorders. See specification, at page 5, lines 9-12.

(6) ISSUES 

1. Whether claim 21, directed to polynucleotides encoding the polypeptides of SEQ
ID NOs: 14 and 15, and variants thereof, meets the written description requirement of 35 U.S.C.
§ 112, first paragraph. In particular, whether all polynucleotides encoding the polypeptides of
SEQ ID NOs: 14 and 15 as well as polypeptides having at least 95% identity to SEQ ID NO: 15
are sufficiently described in the specification in such a way as to reasonably convey to one skilled
in the relevant art that the inventor(s), at the time the application was filed, had possession of the
claimed invention.

2. Whether claims 2, 3, 13-15, and 21 are patentable over claims 1, 2, 8-10, and 13 
of U.S. Patent No. 6,566,066 under the doctrine of obviousness-type double patenting pending a 
timely filed terminal disclaimer for the instant application in compliance with 37 CFR 1.321(c). 

(7) GROUPING OF THE CLAIMS 

As to Issue 1 

Claim 21 stands alone. Claims 2, 3, and 13-15 are grouped together.

As to Issue 2

All of the claims on appeal are grouped together. 

(8) APPELLANTS' ARGUMENTS 

The rejection of claim 21 is improper, as the invention of that claim is sufficiently
described in the specification that one of skill in the relevant art would recognize
applicants' possession of it at the time the application was filed.

Claim 21 stands rejected under 35 U.S.C. § 112, first paragraph, based on the allegation 
that the claimed invention was not sufficiently described in the specification in such a way as to 
reasonably convey to one skilled in the relevant art that the inventor(s), at the time the
application was filed, had possession of the claimed invention. The rejection alleges in particular
that: 

• claim 21 embraces all polynucleotides that encode SEQ ID NOs: 14 and 15 and all
polynucleotides that encode variants of SEQ ID NOs: 14 and 15 that are 95% or more 
identical to SEQ ID NOs: 14 and 15. Applicants have not pointed to [any] basis in the 
application as filed for the full breadth of the claim 

• Applicants arguments (paper filed August 25, 2003 at page 5) are most unconvincing 
because no basis is seen for claim 21. Neither page 3 nor page 23 of the instant 
application mentions all sequences that encode SEQ ID NOs: 14 and/or 15. Thus the 
application does not contemplate all nucleic acids that encode SEQ ID NOs: 14 and 15, 
but only cDNAs. 

Applicants have pointed to support in the specification for a polynucleotide "encoding a
polypeptide having an amino acid sequence of SEQ ID NO: 14 or SEQ ID NO: 15" as recited
specifically at page 23, lines 29-30 of the specification ("SEQ ID NOs: 14 and 15 of the present
invention were encoded by SEQ ID NOs: 1 and 8, respectively"), and for "a naturally occurring
variant having at least 95% sequence identity to the amino acid sequence of SEQ ID NO: 14 or
SEQ ID NO: 15" at page 3, line 20 (naturally occurring protein), and at page 3, lines 29-30 ("'A
variant' refers to a polynucleotide or protein whose sequence diverges from about 5% to about
30% from the nucleic acid or amino acid sequences of the Sequence Listing"), which clearly
encompasses a variant "having at least 95% identity" to SEQ ID NO: 14 or SEQ ID NO: 15.

With respect to the Examiner's allegation that the specification fails to mention all
sequences that encode SEQ ID NOs: 14 and 15, applicants have argued that the PTO has
determined that the identification of all polynucleotide sequences encoding a disclosed protein
sequence is routine in the art based on the well known genetic code. See Guidelines for
Examination of Patent Applications Under 35 U.S.C. 112, ¶ 1, "Written Description"
Requirement, Federal Register, Vol. 66, No. 4, Friday January 5, 2001, page 1102.

Therefore, applicants submit that claim 21 reciting an isolated polynucleotide encoding a
polypeptide having an amino acid sequence of SEQ ID NO: 14 or 15 is adequately described in
the specification based on the disclosure in the specification at page 23 of a species of
polynucleotide of SEQ ID NOs: 1 and 8, encoding the amino acid sequences of SEQ ID NOs: 14
and 15, respectively, and the general knowledge in the art concerning the genetic code.
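
The genetic-code point can be illustrated computationally. The following is a minimal sketch and is not part of the specification or the record: assuming the standard genetic code, it builds the codon table and enumerates every coding sequence for a short peptide. The peptide "MLS" and all function names are hypothetical, chosen only for illustration.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering
# (assumption: the standard nuclear code applies; '*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Invert the table: amino acid -> list of synonymous codons.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate a coding DNA sequence (length divisible by 3) to protein."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encoding_sequences(peptide):
    """Yield every DNA sequence that encodes the given peptide."""
    for codons in product(*(SYNONYMS[aa] for aa in peptide)):
        yield "".join(codons)


def count_encoding_sequences(peptide):
    """Count the encoding sequences without enumerating them."""
    total = 1
    for aa in peptide:
        total *= len(SYNONYMS[aa])
    return total


if __name__ == "__main__":
    peptide = "MLS"  # hypothetical three-residue example, not from the record
    print(count_encoding_sequences(peptide))
    for seq in encoding_sequences(peptide):
        assert translate(seq) == peptide
```

For a full-length protein the set is far too large to list, but the same table defines it completely; `count_encoding_sequences` gives its size without enumeration.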

However, if the Examiner's rejection is based solely on applicants' definition of a
"polynucleotide" as a "cDNA," applicants would consider further amending the claim to recite
"a cDNA encoding ..." to obviate this part of the rejection.

With respect to polynucleotides encoding variants of SEQ ID NO: 15, appellants submit
that these are adequately described in chemical and structural terms such that one skilled in the
relevant art would recognize applicants' possession of them at the time the application was filed.

The requirements necessary to fulfill the written description requirement of 35 U.S.C.
§ 112, first paragraph, are well established by case law.

... the applicant must also convey with reasonable clarity to those skilled
in the art that, as of the filing date sought, he or she was in possession of the
invention. The invention is, for purposes of the "written description" inquiry,
whatever is now claimed. Vas-Cath, Inc. v. Mahurkar, 19 USPQ2d 1111, 1117
(Fed. Cir. 1991).

Attention is also drawn to the Patent and Trademark Office's own "Guidelines for
Examination of Patent Applications Under the 35 U.S.C. Sec. 112, para. 1", published January 5,
2001, which provide that:

An applicant may also show that an invention is complete by disclosure of 
sufficiently detailed, relevant identifying characteristics which provide evidence 
that applicant was in possession of the claimed invention, i.e., complete or partial 
structure, other physical and/or chemical properties, functional characteristics 
when coupled with a known or disclosed correlation between function and 
structure, or some combination of such characteristics. What is conventional or 
well known to one of ordinary skill in the art need not be disclosed in detail. If a 
skilled artisan would have understood the inventor to be in possession of the 
claimed invention at the time of filing, even if every nuance of the claims is not 
explicitly described in the specification, then the adequate description requirement 
is met. 

Thus, the written description standard is fulfilled by both what is specifically disclosed 
and what is conventional or well known to one skilled in the art. 

SEQ ID NO:1 and SEQ ID NO: 14 are specifically disclosed in the application (see, for
example, page 23, lines 29-30). Variants of SEQ ID NO: 14 are described, for example, at page
3, lines 29-30. Chemical and structural features of SEQ ID NO: 14 are described, for example, on
page 24, lines 16-27. Given SEQ ID NO: 14, and the various chemical and structural features
described for SEQ ID NO: 14, one of ordinary skill in the art would recognize naturally-occurring
variants of SEQ ID NO: 14 having at least 95% sequence identity to SEQ ID NO: 14.
Accordingly, the Specification provides an adequate written description of the recited
polypeptide sequences.

A. The Specification provides an adequate written description of the claimed
"variants" of SEQ ID NO:14.

The Office Action has asserted that the claims are not supported by an adequate
written description because

the claimed invention was not sufficiently described in the specification in such a 
way as to reasonably convey to one skilled in the relevant art that the inventor(s), 
at the time the application was filed, had possession of the claimed invention. 

(page 2 of the Final Office Action of July 1, 2003)

Such a position is believed to present a misapplication of the law. 

1. The present claims specifically define the claimed genus through the 
recitation of chemical structure 

Court cases in which "DNA claims" have been at issue (which are hence relevant to
claims to proteins encoded by the DNA and antibodies which specifically bind to the proteins)
commonly emphasize that the recitation of structural features or chemical or physical properties
are important factors to consider in a written description analysis of such claims. For example, in
Fiers v. Revel, 25 USPQ2d 1601, 1606 (Fed. Cir. 1993), the court stated that:

If a conception of a DNA requires a precise definition, such as by structure, 
formula, chemical name or physical properties, as we have held, then a description 
also requires that degree of specificity. 

In a number of instances in which claims to DNA have been found invalid, the courts 
have noted that the claims attempted to define the claimed DNA in terms of functional 
characteristics without any reference to structural features. As set forth by the court in University 
of California v. Eli Lilly and Co., 43 USPQ2d 1398, 1406 (Fed. Cir. 1997):

In claims to genetic material, however, a generic statement such as "vertebrate
insulin cDNA" or "mammalian insulin cDNA," without more, is not an adequate 
written description of the genus because it does not distinguish the claimed genus 
from others, except by function. 

Thus, the mere recitation of functional characteristics of a DNA, without the definition of 
structural features, has been a common basis by which courts have found invalid claims to DNA. 
For example, in Lilly, 43 USPQ2d at 1407, the court found invalid for violation of the written 
description requirement the following claim of U.S. Patent No. 4,652,525: 

1. A recombinant plasmid replicable in procaryotic host containing within its 
nucleotide sequence a subsequence having the structure of the reverse transcript of 
an mRNA of a vertebrate, which mRNA encodes insulin. 

In Fiers, 25 USPQ2d at 1603, the parties were in an interference involving the following
count:

A DNA which consists essentially of a DNA which codes for a human fibroblast 
interferon-beta polypeptide. 

Party Revel in the Fiers case argued that its foreign priority application contained an 
adequate written description of the DNA of the count because that application mentioned a 
potential method for isolating the DNA. The Revel priority application, however, did not have a 
description of any particular DNA structure corresponding to the DNA of the count. The court 
therefore found that the Revel priority application lacked an adequate written description of the 
subject matter of the count. 

Thus, in Lilly and Fiers, nucleic acids were defined on the basis of functional
characteristics and were found not to comply with the written description requirement of 35
U.S.C. § 112; i.e., "an mRNA of a vertebrate, which mRNA encodes insulin" in Lilly, and "DNA
which codes for a human fibroblast interferon-beta polypeptide" in Fiers. In contrast to the
situation in Lilly and Fiers, the claims at issue in the present application define polypeptides in
terms of chemical structure, rather than functional characteristics. For example, the "variant
language" of independent claim 21 recites chemical structure to define the claimed genus:

21. An isolated polynucleotide encoding a polypeptide having an amino acid
sequence of SEQ ID NO: 14 ..., or a naturally occurring variant having at least
95% sequence identity to the amino acid sequence of SEQ ID NO: 14.

From the above it should be apparent that the claims of the subject application are
fundamentally different from those found invalid in Lilly and Fiers. The subject matter of the
present claims is defined in terms of the chemical structure of SEQ ID NO: 14. In the present
case, there is no reliance merely on a description of functional characteristics of the polypeptides
recited by the claims. In fact, there is no recitation of functional characteristics. Moreover, if
such functional recitations were included, they would add to the structural characterization of the
recited polypeptides. The claims of the present application define the recited polypeptides by
structural features, and cases such as Lilly and Fiers stress that the recitation of structure is an
important factor to consider in a written description analysis of claims of this type. By failing to
base its written description inquiry "on whatever is now claimed," the Office Action failed to
provide an appropriate analysis of the present claims and how they differ from those found not to
satisfy the written description requirement in Lilly and Fiers.

2. The present claims do not define a genus which is "highly variant"

Furthermore, the claims at issue do not describe a genus which could be characterized as 
"highly variant." Available evidence illustrates that the claimed genus is of narrow scope. 

In support of this assertion, the Examiner's attention is directed to the enclosed reference 
by Brenner et al. ("Assessing sequence comparison methods with reliable structurally identified 
distant evolutionary relationships," Proc. Natl. Acad. Sci. USA (1998) 95:6073-6078; cited at 
page 19, lines 24-25 of the specification). Through exhaustive analysis of a data set of proteins 
with known structural and functional relationships and with <90% overall sequence identity, 
Brenner et al. have determined that 30% identity is a reliable threshold for establishing 
evolutionary homology between two sequences aligned over at least 150 residues. (Brenner et 
al., pages 6073 and 6076.) Furthermore, local identity is particularly important in this case for 
assessing the significance of the alignments, as Brenner et al. further report that >40% identity 
over at least 70 residues is reliable in signifying homology between proteins. (Brenner et al., 
page 6076.) 

The present application is directed, inter alia, to proteins related to the amino acid
sequence of SEQ ID NO: 14. In accordance with Brenner et al., naturally occurring molecules
may exist which could be characterized as proteins related to SEQ ID NO: 14 which have as little
as 40% identity over at least 70 residues to SEQ ID NO: 14. The "variant language" of the
present claims recites, for example, polynucleotides encoding "a naturally-occurring amino acid
sequence having at least 95% sequence identity to the sequence of SEQ ID NO: 14" (note that
SEQ ID NO: 14 has 585 amino acid residues). This variation is far less than that of all potential
proteins related to SEQ ID NO: 14, i.e., those proteins having as little as 40% identity over at
least 70 residues to SEQ ID NO: 14.
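
The arithmetic behind this comparison can be made explicit. The sketch below is illustrative only and rests on an assumption the brief does not state: identity is taken as the fraction of identical residues in an ungapped alignment of the stated length. The function names are hypothetical. Under that assumption, a variant at least 95% identical to the 585-residue SEQ ID NO: 14 may differ at no more than 29 positions, whereas a 70-residue alignment at the 40% threshold may differ at as many as 42 of its 70 positions.

```python
def min_identical_residues(length, min_identity_pct):
    """Smallest number of identical residues meeting the identity threshold.

    Integer arithmetic (ceiling division) avoids floating-point rounding.
    """
    return -(-length * min_identity_pct // 100)


def max_differences(length, min_identity_pct):
    """Largest number of differing positions still meeting the threshold."""
    return length - min_identical_residues(length, min_identity_pct)


def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # 585 residues is the length the brief gives for SEQ ID NO: 14.
    print(min_identical_residues(585, 95), max_differences(585, 95))
    # Brenner et al. threshold as cited in the brief: 40% over 70 residues.
    print(min_identical_residues(70, 40), max_differences(70, 40))
```

A gapped or local alignment would change the denominators, so these counts are bounds under the stated assumption rather than a statement of how the specification computes identity.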

3. The state of the art at the time of the present invention is further advanced 
than at the time of the Lilly and Fiers applications 

In the Lilly case, claims of U.S. Patent No. 4,652,525 were found invalid for failing to 
comply with the written description requirement of 35 U.S.C. § 112. The '525 patent claimed the
benefit of priority of two applications, Application Serial No. 801,343 filed May 27, 1977, and
Application Serial No. 805,023 filed June 9, 1977. In the Fiers case, party Revel claimed the
benefit of priority of an Israeli application filed on November 21, 1979. Thus, the written
description inquiry in those cases was based on the state of the art at essentially the "dark ages"
of recombinant DNA technology.

The present application has a priority date of January 7, 1999. Much has happened in the 
development of recombinant DNA technology in the 20 or more years between the filing of
the applications involved in Lilly and Fiers and that of the present application. For example, the
technique of polymerase chain reaction (PCR) was invented. Highly efficient cloning and DNA 
sequencing technology has been developed. Large databases of protein and nucleotide sequences 
have been compiled. Much of the raw material of the human and other genomes has been 
sequenced. With these remarkable advances one of skill in the art would recognize that, given 
the sequence information of SEQ ID NO:8 and SEQ ID NO: 14, and the additional extensive 
detail provided by the subject application, the present inventors were in possession of the 
claimed polynucleotide variants at the time of filing of this application.

4. Summary

The Office Action failed to base its written description inquiry "on whatever is now 
claimed." Consequently, the Action did not provide an appropriate analysis of the present claims 
and how they differ from those found not to satisfy the written description requirement in cases 
such as Lilly and Fiers. In particular, the claims of the subject application are fundamentally 
different from those found invalid in Lilly and Fiers. The subject matter of the present claims is 
defined in terms of the chemical structure of SEQ ID NO: 8 or SEQ ID NO: 14. The courts have
stressed that structural features are important factors to consider in a written description analysis 
of claims to nucleic acids and proteins. In addition, the genus of polypeptides defined by the 
present claims is adequately described, as evidenced by Brenner et al. and consideration of the
claims of the '740 patent involved in Lilly. Furthermore, there have been remarkable advances in
the state of the art since the Lilly and Fiers cases, and these advances were given no 
consideration whatsoever in the position set forth by the Office Action. 

Appellants therefore submit that the claimed invention, as recited in claim 21, is
adequately described in the specification such that the skilled artisan would recognize applicants'
possession of said invention, and withdrawal of the rejection of claim 21 under 35 U.S.C. § 112,
first paragraph, is requested.

The rejection of claims 2, 3, 13-15, and 21 under nonstatutory double patenting is 
overcome by the filing of a terminal disclaimer in compliance with 37 CFR 1.321(c) 

The Examiner's rejection of claims 2, 3, 13-15, and 21 under the doctrine of obviousness- 
type double patenting as unpatentable over claims 1, 2, 8-10, and 13 of U.S. Patent No. 
6,566,066 may be overcome by the timely filing of a terminal disclaimer in compliance with 37 
CFR 1.321(c). 

Applicants submit that the cited patent is commonly owned with the instant application, and
that a terminal disclaimer in compliance with 37 CFR 1.321(c) will be filed pending resolution of 
the other outstanding rejections of these claims. 



09/864,711 



Docket No.: PB-0008-1 CIP 



Due to the urgency of this matter, including its economic and public health implications, 
an expedited review of this appeal is earnestly solicited. 

If the USPTO determines that any additional fees are due, the Commissioner is hereby 
authorized to charge Deposit Account No. 09-0108. 

This brief is enclosed in triplicate.



Respectfully submitted,

INCYTE CORPORATION

Date:

David G. Streeter, Ph.D.
Reg. No. 43,168
Direct Dial Telephone: (650) 845-5741

Customer No.: 27904
3160 Porter Drive
Palo Alto, California 94304
Phone: (650) 855-0555
Fax: (650) 849-8886

Enclosures:
1. Brenner et al., Proc. Natl. Acad. Sci. 95:6073-78 (1998)

APPENDIX - CLAIMS ON APPEAL



2. An isolated polynucleotide comprising a nucleic acid sequence 
selected from SEQ ID NOs: 1 and 8 and the complements thereof. 



3. A composition comprising a polynucleotide of claim 2 and a labeling moiety. 



13. A vector comprising a polynucleotide of claim 2. 



14. A host cell comprising the vector of claim 13. 



15. A method for using a host cell to produce a protein, the method comprising:
a) culturing the host cell of claim 14 under conditions for expression of the protein; and
b) recovering the protein from cell culture.



21. An isolated polynucleotide, or the complement thereof, encoding a polypeptide 
having an amino acid sequence of SEQ ID NO: 14 or SEQ ID NO: 15, or a naturally occurring
variant having at least 95% sequence identity to the amino acid sequence of SEQ ID NO: 14.

ENCLOSURE 1

Brenner et al., "Assessing sequence comparison methods with reliable structurally identified
distant evolutionary relationships," Proc. Natl. Acad. Sci. USA (1998) 95:6073-6078.